Search results for "data compression"
showing 10 items of 99 documents
Text Compression Using Antidictionaries
1999
International audience; We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.
Suffix Array Construction on Multi-GPU Systems
2019
Suffix arrays are prevalent data structures being fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction both on CPUs and GPUs have been proposed over the years. Although providing significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …
The Myriad Virtues of Wavelet Trees
2009
Wavelet Trees have been introduced in [Grossi, Gupta and Vitter, SODA '03] and have been rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (like,…
On parsing optimality for dictionary-based text compression—the Zip case
2013
Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results …
Image Processing Chain For Digital Still Cameras Based On The Simpil Architecture
2005
The new generation of wireless devices herald the development of products for integrated portable image and video communication requiring to image and video applications high computing performance. Portable MultiMedia Supercomputers (PMMS), a new class of architectures, allow to combine high computational performance, needed by multimedia applications, and a big energy efficiency, needed by portable devices. Among PMMS, the SIMPil (SIMD processor pixel) architecture satisfies the above requirements, especially with video and digital images processing tasks. In this paper we, exploit the SIMPil computation and throughput efficiency to implement the whole image processing chain of a digital s…
The rightmost equal-cost position problem.
2013
LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence formed by the couple (length, offset). For a given factor, the smallest is the offset, the smallest is the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for…
Image Compression by 2D Motif Basis
2011
Approaches to image compression and indexing based on extensions to 2D of some of the Lempel-Ziv incremental parsing techniques have been proposed in the recent past. In these approaches, an image is decomposed into a number of patches, consisting each of a square or rectangular solid block. This paper proposes image compression techniques based on patches that are not necessarily solid blocks, but are affected instead by a controlled number of undetermined or don't care pixels. Such patches are chosen from a set of candidate motifs that are extracted in turn from the image 2D motif basis, the latter consisting of a compact set of patterns that result from the autocorrelation of the image w…
Imaging of Temporomandibular Joint: Approach by Direct Volume Rendering
2014
Background: The purpose of this study was to conduct a morphological analysis of the temporomandibular joint, a highly specialized synovial joint that permits movement and function of the mandible. Materials and Methods: We have studied the temporom-andibular joint anatomy, directly on the living, from 3D images obtained by medical imaging Computed Tomography and Nuclear Magnetic Resonance acquisition, and subsequent re-engineering techniques 3D Surface Rendering and Volume Rendering. Data were analysed with the goal of being able to isolate, identify and distinguish the anatomical structures of the joint, and get the largest possible number of information utilizing software for post-proces…
Analog joint source-channel Multiple Description coding scheme over AWGN parallel channels
2011
We propose a low complexity analog joint source channel coding Multiple Description (MD) scheme for transmitting the symbols of a Gaussian source across a pair of independent AWGN channels. The outputs of these channels have each a separated receiver, whereas a third receiver has both outputs available. At the transmitter side, a pair of bandwidth-reduction analog mappings are used for joint source-channel coding. The presented scheme has the inherent advantage over digital MD schemes based on separation, that coding and decoding can be performed by using a single-letter (or symbol), a strategy that is very suitable for applications where latency originated by the digital compression and th…
A New Combinatorial Approach to Sequence Comparison
2008
In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output…